The problem

You need to remove sensitive elements 
of a PDF document for public release 

are they actually removed?
can someone reveal your secrets?



PDF Redacting Failure 

I wasn't going to even bother writing about this, but I got too many 
e-mails from people. 

We all know that masking over the text of a PDF document doesn't 
actually erase the underlying text, right? 

Don't we? 

Seems like we don't. 



Italian media have published classified sections of an official 
US military inquiry into the accidental killing of an Italian 
agent in Baghdad. 

A Greek medical student at Bologna University who was 
surfing the web early on Sunday found that with two simple 
clicks of his computer mouse he could restore censored 
portions of the report. 

Tags: Adobe, Italy, redaction, secrecy

Posted on May 3, 2005 at 9:11 AM • 24 Comments

https://www.schneier.com/blog/archives/2005/05/pdf_redacting_f.html

It's not a new fact 



There are 
plenty of real 
examples 



You just need to: 

1. uncompress the PDF

2. remove all " re\n" occurrences

("re" = rectangle operator)
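
The two steps above can be sketched in Python. This is a minimal illustration, assuming the content stream has already been decompressed and that each operator sits on its own line; the function name `strip_rectangles` and the sample stream are made up for the example. It is a variant of the trick described above: it drops each whole "x y w h re" line rather than only the operator.

```python
def strip_rectangles(content: bytes) -> bytes:
    """Remove every line that ends with the 're' (rectangle) operator."""
    kept = [
        line
        for line in content.split(b"\n")
        if not line.rstrip().endswith(b" re")
    ]
    return b"\n".join(kept)


# Hypothetical decoded content stream: some text, then a black box over it.
sample = (
    b"BT\n/F1 12 Tf\n72 700 Td\n(secret) Tj\nET\n"
    b"0 0 0 rg\n70 695 60 14 re\nf\n"
)
cleaned = strip_rectangles(sample)
print(cleaned.decode("ascii"))
```

The fill operator that follows the removed rectangle is left in place; with no path to fill it has nothing to paint.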



UNCLASSIFIED 



III. TRAFFIC CONTROL POINTS, BLOCKING POSITIONS, AND TRAINING 

A. (U) Introduction 

(U) This section examines TCPs, BPs, and training matters. It first discusses the
difference between a TCP and a BP. Standing Operating Procedures (SOPs) for the
various units involved regarding TCPs and BPs are assessed, and the Rhino Bus TTP is
outlined. This is followed by a review of the training on TCPs, BPs, weapons, and Rules
of Engagement (ROE) that the Soldiers manning BP 541 had received before 4 March
2005. The ROE that were in effect that night are explained. The section concludes with
findings and recommendations.



B. (U) Traffic Control Points and Blocking Positions




C. (U) Standing Operating Procedures in use on 4 March 2005

(U) SOPs are designed to serve as guidelines for specific operations and are not
prescriptive in nature. They provide a baseline for acceptable operations from which
commanders can derive principles and techniques and adapt them to their current
mission. (Annexes 44C, 65C, 72C, 96C, 98C).
UNCLASSIFIED



http://download.repubblica.it/pdf/rapportousacalipari.pdf 
seen in its metadata: "EmailSubject (Another Redact Job For You)" 



the topic wasn't really 
covered technically 

AFAIK 



The reverse problem 

You need to carry a sensitive PDF,
or exfiltrate some information:



Can you convincingly pretend 
that it was a mistake, 
and yet easily re-enable the contents? 



...and, more importantly... 

it still makes it an interesting exercise 

to learn and experiment with PDF internals :)



...and it might also be useful for a CTF steganography challenge.



it's about hiding 
parts of the PDF document 

not hiding data in a PDF file 
+ nothing reader-specific 



General outline of this talk 



3 relatively independent parts: 

1. a non-technical approach 

2. a basic introduction to the PDF file format 

3. a technical perspective 



a non-technical approach 

Part I /III 



What about that NSA doc ? 

there is an NSA document on the topic,
worth a read, but it covers Adobe Acrobat (Pro) only
http://www.nsa.gov/ia/_files/app/pdf_risks.pdf



Preamble 

this presentation has a lot of hands-on examples, 
that you can find at: 



http://pdf.corkami.com



Outline 



1. the problem (introduction) 

2. outline 

a. see Google "recursion" :)

3. examples 

a. color 

i. forgotten text 

b. overlapped text 

c. secured documents 

i. bypassing security 

d. overlapped image 

i. extracting image 

4. Conclusion 



So, you tried to hide 
elements in a PDF... 



"well, I don't see them anymore" 

try with the next slide: 
nothing is visible... and yet... 

1. "Select All" text with your favorite PDF viewer

2. Copy and paste in a text editor 



Example: color 



[Screenshot: the Edit menu of Adobe Reader on "PDF Secrets.pdf", with "Select All" (Ctrl+A) highlighted]



Example 1 




hint 



It worked, right? 

you can't see the text, 
but it's still on the page 

— ► the software can select it 



[Screenshot: Notepad showing the pasted text "Example: color / hidden via / white color"]



Btw 



this can lead to unexpected results, 
so be careful before publishing slides, 
even if you think you have nothing to remove 



try with the next slide :)



Example: forgotten text 



HyperVortex 1.0
a publication software 

Roberto Martinez 



Oops 



maybe it wasn't a secret to be removed,
but it's still there!



put extra hidden content for easier indexing 



god, I hate making slides!!! 

Example: forgotten text 

HyperVortex 1.0

a publication software 

title 

Roberto Martinez 
authors 

insert stupid footer here ~ LaTeX sucks!!! 



Another try 

Try to get the secret from the next slide, 
with the same copy-paste trick... 



Example: overlapped text 



CONFIDENTIAL 



Once again 



the text is behind the "CONFIDENTIAL" shape, 
but it's still there! 
the software selects everything 
(not only the front layer) 



[Screenshot: Notepad showing the pasted text "Example: overlapped text / CONFIDENTIAL / hidden via / overlapping shape"]



Better than "Select all" 



pdftotext does it for you

instantly see which text is still hidden



D:\>pdftotext -layout -l 1 "PDF Secrets.pdf"
Syntax Warning (631): Badly formatted number

D:\>





[Screenshot: Notepad showing the extracted text "PDF secrets / hiding & revealing secrets in PDF documents"]







But PDF can prevent that? 



yes, in theory 

but the text is still there, and the reader decrypts it to display it
— ► the protection can be circumvented



[Screenshot: Adobe Reader showing "Example - overlapped & protected text.pdf (SECURED)"; the Security Settings panel reads: "This document has an open password or a modify password. You cannot copy this document."]



Example - overlapped & protected text 



CONFIDENTIAL 



Bypassing copy/paste protection 



either:

• some readers just ignore it

o like Evince

• generate a new file out of the original one

o print PDF as PDF
(not 100% compatible, but fast and usually works)

o decrypt



D:\>qpdf --decrypt protected.pdf unprotected.pdf
D:\>



[Screenshot: Chrome displaying "Example - overlapped & protected text.pdf"]



1. open in Chrome

2. print



[Screenshot: Chrome's Print dialog for the same document, with the destination set to "Save as PDF"]



1. change the printer to "Save as PDF"

2. save



[Screenshot: Adobe Reader showing the Chrome-printed copy of the document, with its Edit menu (Select All, Copy...) open]



the final document looks identical
and is not (SECURED) anymore



Copy/paste corruption 

• sometimes, text can be copied,
but it comes out corrupted

• it's not protection, just incompatibility 

— ► try with another reader 

• it could be abused 

o but it's not easy to implement 

o and it's still easy to recover content 
(it's just a substitution cipher) 



[Screenshot: "polawisv1.pdf" (the POLAWIS cipher submission by Arkadiusz Wysokinski and Ireneusz Sikora, dated March 15, 2014) opened in SumatraPDF and in a browser, with the text copied from each viewer pasted into Notepad]

copy/paste weirdness



Ok, a last one 

is it hopeless? 
try this one... 



Example: overlapped image 




Failure? 



the secret behind the shape is a picture:
— ► it's not copied as text by standard software
(common software doesn't copy pictures)





[Screenshot: Notepad showing the pasted text "Example: overlapped image / SECRET"]









Does it mean we're safe?

No: 

the image is still present in the PDF document. 
— ► it's trivial to extract it with a standard tool 

Example: 

use pdfimages (or mutool)



D:\>pdfimages -f 32 -l 32 "PDF Secrets.pdf"
D:\>



D:\>mutool extract "PDF Secrets.pdf"
extracting image img-0015.png
extracting image img-0016.png



[Screenshot: an image viewer displaying the extracted "secret image" (24 BPP, Portable Pixelmap)]





extracting our secret image directly from the file 



Conclusion 

on Part I / III 



text can be copied 
images can be extracted 



the "Select All" trick often works,

but not always



even if "Select All" does not work,
secrets may still be recovered



but there are 
more advanced tricks! 



need to study PDF internals 



PDF 101 

basics of the PDF file format 



Part II /III 



[Poster: a minimal "Hello World!" PDF file dissected into its header, body, xref table and trailer, next to the page as rendered in Adobe Reader. The recoverable annotations follow.]

CONTENT STREAM

BT                  begin text
/F1 110 Tf          font F1 (Arial) set to size 110, select this font
10 400 Td           move to coordinate 10, 400
(Hello World!) Tj   output text "Hello World!"
ET                  end text

CROSS REFERENCES

xref
0 5                 5 objects, starting at index 0
                    (standard first empty object 0, then the offsets to objects 1 to 4, revision 0)

BASICS

PDF is text based, with binary streams

TYPES
(): string           ex: (Hello World!)
/Name: identifiers   ex: /Count 1
<<>>: dictionary     ex: <</key1 value1 /key2 value2>>
[]: array            ex: [0 1 2 3 4]

OBJECT REFERENCES

content is stored in objects
most content can be inlined or referenced in a separate object

/Key1 value is equivalent to /Key1 3 0 R
[...]
3 0 obj
value
endobj

BINARY STREAMS

binary streams are stored in separate objects like this:

<object number> <object revision> obj
<< -stream metadata- >>
stream
-stream content-
endstream
endobj

TRIVIA

PDF was first specified by Adobe Systems in 1993
initial versions of Adobe Acrobat were not free

FILE STRUCTURE

head of the file
the %PDF- signature identifies the format and required version

xref
xref
<starting object> <object count>
followed by xref entries:
if [object in use]: <offset> <generation> n
else: <next free object id> <generation> f

end of the file
startxref
<xref offset>
%%EOF

PARSING

the header %PDF-1.? signature is checked to identify the file format
the xref is located via the startxref offset
the xref table gives the offset of each object
the trailer is parsed
each object reference is followed, building the document
pages are created, text is rendered



PDF 101 an Adobe document walkthrough 



My poster on the PDF format (free to print, reuse...) http://pics.corkami.com 

to order a print: http://prints.corkami.com 



A simple example 

helloworld.pdf 



reminder: this is simplified, PDF is actually much more complex 



[Screenshot: simple.pdf in Adobe Reader, displaying "Hello World!"]



%PDF-1.1
%âãÏÓ

1 0 obj
<< /Pages 2 0 R >>
endobj

2 0 obj
<< /Kids [3 0 R] /Count 1 /Type /Pages >>
endobj

3 0 obj
<< /Parent 2 0 R /MediaBox [0 0 612 792]
/Resources << /Font << /F1 <<
/BaseFont /Arial /Subtype /Type1 /Type /Font>>
>> >> /Contents 4 0 R /Type /Page >>
endobj

4 0 obj
<< /Filter /FlateDecode /Length 57 >>
stream
[57 bytes of Flate-compressed binary data]
endstream
endobj

xref
0 5
0000000000 65535 f
0000000016 00000 n
0000000051 00000 n
0000000111 00000 n
0000000283 00000 n

trailer << /Root 1 0 R /Size 5 >>

startxref
414
%%EOF



(text) 

binary stream 

(text) 






A PDF file is 



• text-based 

o white-space tolerant 

• with binary streams 

— ► it can be explored with a decent text editor 



if you need one, try Notepad++ 




http://notepad-plus-plus.org/ 



Recommended environment 



• text editor 

• Sumatra 

o single-file viewer 
o updates on the fly 

• a tool to decompress streams 

o (explanations later) 

• check mistakes with qpdf --check or pdfinfo



[hw-uncompressed.pdf, opened in a text editor]

%PDF-1.1
%âãÏÓ

1 0 obj
<< /Pages 2 0 R >>
endobj

2 0 obj
<< /Kids [3 0 R] /Type /Pages /Count 1 >>
endobj

3 0 obj
<< /Parent 2 0 R /MediaBox [0 0 612 792]
/Resources << /Font << /F1 <<
/BaseFont /Arial /Subtype /Type1 /Type /Font>>
>> >> /Contents 4 0 R /Type /Page >>
endobj

4 0 obj
<< /Length 53 >>
stream
BT
/F1 110
Tf
10 400 Td
(Bye World!) Tj
ET
endstream
endobj

xref
0 5
0000000000 65535 f
0000000016 00000 n
0000000051 00000 n
0000000109 00000 n
0000000281 00000 n

trailer << /Root 1 0 R /Size 5 >>

startxref
384
%%EOF







[Screenshot: hw-uncompressed.pdf in SumatraPDF, now displaying "Bye World!"]



editing and viewing the changes on the fly 



A PDF structure 



1. header 

o signature 

2. body 

o objects 

3. cross-reference table 

4. trailer 

5. xref pointer 

6. end of file signature 



Signature 



1. PDF signature 

o %PDF-1.0 - %PDF-1.7 

2. charset identifier 

o not required 

o tells tools it's not ASCII 

o 4 non-ASCII chars in a 
comment 



%PDF-1.1
%âãÏÓ






Body 

made of objects 

• <number> <generation> obj 
<content> 
endobj 



1 0 obj 

<< /Pages 2 0 R >> 
endobj 

2 0 obj 

<< /Kids [3 0 R] /Count 1 /Type /Pages >> 
endobj 




Xref 



• table

• offsets of each object

xref
0 5
0000000000 65535 f
0000000016 00000 n
0000000051 00000 n
0000000111 00000 n
0000000283 00000 n

5 objects, starting at index 0
obj #0: always null
obj #1: offset 16
obj #2: offset 51



• each line = 20 chars 

o space before CR 
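
As a rough sketch of the fixed-width layout described above, the following Python helper formats one xref entry: a 10-digit offset, a 5-digit generation, a flag, and a trailing space before a single end-of-line character, 20 bytes in total. The function name `xref_entry` is hypothetical, and a one-byte line ending is assumed.

```python
def xref_entry(offset: int, generation: int = 0, in_use: bool = True) -> bytes:
    """Format one cross-reference entry as exactly 20 bytes."""
    flag = b"n" if in_use else b"f"
    # 10 digits + space + 5 digits + space + flag + space + newline = 20 bytes
    return b"%010d %05d %s \n" % (offset, generation, flag)


free_entry = xref_entry(0, 65535, in_use=False)  # object 0: always free
first_object = xref_entry(16)                    # object 1 at offset 16
print(free_entry + first_object)
```

Because every entry has the same size, a reader can jump straight to the entry of object N without scanning the table.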






Trailer 1/2 

• structure 

a. "trailer" 

b. object-like content 

• defines the "root" object 

o /Size = #(xref elements) 



trailer << /Root 1 0 R /Size 5 >>



Trailer 2/2 



• pointer to xref

a. "startxref"

b. offset to xref
■ (decimal)

• End Of File marker

a. %%EOF



startxref
414
%%EOF



Basic types 

names, strings, dictionaries... 



Literals 



• (string) 

• <hex> 

• %comment until line return 

• some others, less-used types 
(PDF is quite f*cked up) 
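
The first two notations in the list above, (string) and <hex>, can encode the same bytes. Here is a small Python sketch of how a hex string could be decoded; the helper name `decode_hex_string` is invented for this example, whitespace between digits is ignored, and an odd final digit is padded with 0.

```python
def decode_hex_string(token: str) -> bytes:
    """Decode a PDF hex string such as '<48 65 6C>' into raw bytes."""
    digits = "".join(token.strip("<>").split())  # drop delimiters and whitespace
    if len(digits) % 2:
        digits += "0"  # a missing final digit is assumed to be 0
    return bytes.fromhex(digits)


hello = decode_hex_string("<48 65 6C 6C 6F 20 57 6F 72 6C 64 21>")
print(hello)
```

The decoded bytes are the same as those of the literal string (Hello World!), which is why the two forms can be swapped in a content stream.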



4 0 obj
<< /Length 53 >>
stream
BT
/F1 110
Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj

equivalent files

4 0 obj
<< /Length 75 >>
stream
BT
/F1 110
Tf
10 400 Td
<48 65 6C 6C 6F 20 57 6F 72 6C 64 21> Tj
ET
endstream
endobj

(objects 1 to 3, the xref table and the trailer are identical in both files; only the startxref offset differs: 384 and 407)



Object reference 



• <object> <generation> R
points to an object

• it can be replaced with
the actual contents of the object

some objects CAN'T be inlined

<generation> is very rarely non-zero



1 0 obj
<< /Pages 2 0 R >>
endobj

2 0 obj
<< /Kids [3 0 R] /Count 1 /Type /Pages >>
endobj



Object reference - example 1 



57 354 0 R 



354 0 obj 
57 

endobj 



2 equivalent examples via object reference 



Object reference syntax 



it's odd, but critical to understand 

• 3 0 1 => 3 elements (3 numbers): 

a. 3 

b. 0 

c. 1 

• 3 0 R => 1 element: 

a. reference to "3 0" 

■ object 3 

■ generation 0 

Other PDF syntax rules follow common-sense 
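
To make the "3 0 1" versus "3 0 R" distinction concrete, here is a small Python sketch that groups already-split tokens. It is an illustration only, not a real PDF tokenizer: the function name `group_references` and the ("ref", object, generation) tuple are choices made for this example.

```python
def group_references(tokens):
    """Turn '<num> <gen> R' triples into one element; keep other tokens."""
    out = []
    for tok in tokens:
        two_ints_before = (
            len(out) >= 2
            and isinstance(out[-1], int)
            and isinstance(out[-2], int)
        )
        if tok == "R" and two_ints_before:
            generation = out.pop()
            number = out.pop()
            out.append(("ref", number, generation))  # a single element
        elif tok.isdigit():
            out.append(int(tok))
        else:
            out.append(tok)
    return out


three_numbers = group_references("3 0 1".split())
one_reference = group_references("3 0 R".split())
print(three_numbers, one_reference)
```

The "R" only makes sense after the two numbers have been read, so the grouping has to look backwards.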



Name objects 



• "reserved keywords" 

o like symbols in Ruby 

• starts with / 

o /Pages , /Kids ... 

• case sensitive 

o CamelCase by default 

o undefined names are ignored 

=>/pages != /Pages 

(useful to disable tags) 






Array

Syntax

• [ <values>* ]

Examples:

• [3 0 R] = 1 value

a. "3 0 R"

• [0 0 612 792] = 4 values

a. 0

b. 0

c. 612

d. 792



/Kids [3 0 R]
/MediaBox [0 0 612 792]



Dictionaries 

Syntax: 

• << [<name> <value>]* >>

Object 1 sets:

1. /Pages to "2 0 R"

Object 2 sets:

1. /Kids to "[3 0 R]"

2. /Count to "1"

3. /Type to /Pages



1 0 obj
<< /Pages 2 0 R >>
endobj

2 0 obj
<< /Kids [3 0 R] /Count 1 /Type /Pages >>
endobj



Object reference - example 2



/Pages 2 0 R
is "equivalent" to
/Pages <<
/Kids [3 0 R]
/Count 1
/Type /Pages
>>

and then "3 0 R" can be expanded the same way, too



1 0 obj
<< /Pages 2 0 R >>
endobj

2 0 obj
<< /Kids [3 0 R] /Count 1 /Type /Pages >>
endobj



Binary streams 

parameters, filters... 



Streams 



syntax: 

1. usual object declaration

2. parameters dictionary 

3. stream 

+ return character 

4. stream data 

5. endstream 

+ return character 

6. usual endobj 

stream data is not interpreted 
(at object level) 



4 0 obj
<< /Filter /FlateDecode /Length 57 >>
stream
[57 bytes of Flate-compressed binary data]
endstream
endobj



Example 



object 4 

• stream parameters 

o /Filter = /FlateDecode 
o /Length = 57 

• stream content (binary)

[57 bytes of compressed data, unreadable as text]



4 0 obj
<< /Filter /FlateDecode /Length 57 >>
stream
[57 bytes of Flate-compressed binary data]
endstream
endobj



Binary streams 

• can be stored with different encodings

o /Filter

o encodings can be cascaded

• content is decoded after each filter

only the final data matters



Streams don't enforce encodings

as long as the result is correct
once decoded by the filters



<< /Length 53 >>
stream
BT
/F1 110 Tf
10 400 Td
(Hello World!) Tj
ET
endstream



<< /Filter /FlateDecode /Length 57 >>
stream
[57 bytes of Flate-compressed binary data]
endstream



these 2 streams are equivalent, 
just using a different encoding 



<< /Filter [/ASCIIHexDecode /FlateDecode] /Length 170 >>
stream
78 9C 73 0A E1 52 50 D0 77 33 54 30 34
34 00 B2 42 D2 80 84 A1 81 82 89 81 81
42 48 0A 90 AD E1 91 9A 93 93 AF 10 9E
5F 94 93 A2 A8 A9 10 92 C5 E5 1A C2 05
00 21 30 0B D7
endstream



<< /Filter /FlateDecode /Length 57 >>
stream
[the same 57 bytes, stored as raw binary]
endstream



/ASCIIHexDecode will 
decode ASCII Hex to binary 



Main filters 



• <none>: direct raw binary in the file 

• /FlateDecode : ZIP's deflate decompression 

— ► smaller 

• /ASCIIHexDecode: turns hex into binary 

o 41 0A => "A\n" 

— ► easy text editing (but binary is very common) 

mutool has a specific option for that 
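
The filters listed above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the function `apply_filters` is invented for the example, it handles only /ASCIIHexDecode and /FlateDecode, and the sample stream is freshly compressed with Python's zlib rather than taken from the files shown in the slides. It illustrates that the same content can be stored raw, deflated, or deflated and then hex-encoded.

```python
import zlib


def apply_filters(data: bytes, filters: list) -> bytes:
    """Decode stream data by applying each filter in order."""
    for name in filters:
        if name == "/ASCIIHexDecode":
            digits = "".join(data.decode("ascii").split()).rstrip(">")
            if len(digits) % 2:
                digits += "0"  # pad an odd final digit
            data = bytes.fromhex(digits)
        elif name == "/FlateDecode":
            data = zlib.decompress(data)
        else:
            raise ValueError("filter not handled in this sketch: " + name)
    return data


content = b"BT\n/F1 110 Tf\n10 400 Td\n(Hello World!) Tj\nET\n"
flate = zlib.compress(content)                                    # binary form
hexed = " ".join(f"{byte:02X}" for byte in flate).encode("ascii")  # text form

raw_result = apply_filters(content, [])
flate_result = apply_filters(flate, ["/FlateDecode"])
cascade_result = apply_filters(hexed, ["/ASCIIHexDecode", "/FlateDecode"])
print(raw_result == flate_result == cascade_result)
```

The filter list is applied left to right, so the hex layer is removed first and the deflate layer second.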



Other filters 



Images 

• /DCTDecode to store JPEG files directly 

o not just the data, even the header! 

• JPEG2000, Fax 

Encryption 

• Crypt 

o RC4 or AES



Let's put it all together 

how is the file actually parsed? 



Parsing 1/7 

1. Signature is checked



%PDF-1.1
%âãÏÓ

1 0 obj
<< /Pages 2 0 R >>
endobj

2 0 obj
<< /Kids [3 0 R] /Type /Pages /Count 1 >>
endobj

3 0 obj
<< /Parent 2 0 R /MediaBox [0 0 612 792]
/Resources << /Font << /F1 <<
/BaseFont /Arial /Subtype /Type1 /Type /Font>>
>> >> /Contents 4 0 R /Type /Page >>
endobj

4 0 obj
<< /Length 53 >>
stream
BT
/F1 110 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj

xref
0 5
0000000000 65535 f
0000000016 00000 n
0000000051 00000 n
0000000109 00000 n
0000000281 00000 n

trailer << /Root 1 0 R /Size 5 >>

startxref
384

%%EOF



Parsing 2/7 

2. %%EOF is located 



startxref
384

%%EOF



Parsing 3/7



3. xref is located via startxref



startxref
384



Parsing 4/7 



4. xref gives the offset
of each object



xref
0 5
0000000000 65535 f
0000000016 00000 n
0000000051 00000 n
0000000109 00000 n
0000000281 00000 n



Parsing 5/7 

5. trailer is parsed 
— ► gives /Root object 



trailer << /Root 1 0 R /Size 5 >>






Parsing 6/7 

6. objects are parsed 

a. /Root object contains /Pages 

b. /Pages contains page array 

■ /Kids 

c. each /Page has: 

■ size: /MediaBox 

■ /Contents 

• as stream object 

■ /Resources 

• defines the /Font dictionary



1 0 obj
<< /Pages 2 0 R >>
endobj

2 0 obj
<< /Kids [3 0 R] /Type /Pages /Count 1 >>
endobj

3 0 obj
<< /Parent 2 0 R /MediaBox [0 0 612 792]
/Resources << /Font << /F1 <<
/BaseFont /Arial /Subtype /Type1 /Type /Font >>
>> >> /Contents 4 0 R /Type /Page >>
endobj

4 0 obj
<< /Length 53 >>
stream
BT
/F1 110 Tf
10 400 Td
(Hello World !) Tj
ET
endstream
endobj

xref
0 5
0000000000 65535 f
0000000016 00000 n
0000000051 00000 n
0000000109 00000 n
0000000281 00000 n

trailer << /Root 1 0 R /Size 5 >>

startxref
384

%%EOF 
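
The walk from the trailer down to the page contents can be sketched as follows. It is a rough illustration, not a real parser: it scans for `N 0 obj ... endobj` with regular expressions instead of using the xref table, assumes generation 0 everywhere, and only follows the first page. The helper names (`load_objects`, `ref`, `first_page_stream`) are made up for this example.

```python
import re

OBJ_RE = re.compile(rb"(\d+) 0 obj\s*(.*?)\s*endobj", re.DOTALL)


def load_objects(pdf: bytes) -> dict:
    """Map object number -> raw object body (generation 0 assumed)."""
    return {int(number): body for number, body in OBJ_RE.findall(pdf)}


def ref(body: bytes, key: bytes) -> int:
    """Object number of the first indirect reference stored under `key`."""
    match = re.search(re.escape(key) + rb"\s*\[?\s*(\d+) 0 R", body)
    if match is None:
        raise KeyError(key.decode())
    return int(match.group(1))


def first_page_stream(pdf: bytes) -> bytes:
    objects = load_objects(pdf)
    trailer = pdf[pdf.rindex(b"trailer"):]
    root = objects[ref(trailer, b"/Root")]      # trailer -> /Root
    pages = objects[ref(root, b"/Pages")]       # /Root -> /Pages
    page = objects[ref(pages, b"/Kids")]        # /Pages -> first entry of /Kids
    contents = objects[ref(page, b"/Contents")]  # /Page -> /Contents
    stream = re.search(rb"stream\r?\n(.*?)\r?\nendstream", contents, re.DOTALL)
    return stream.group(1)


sample = b"""%PDF-1.1
1 0 obj
<< /Pages 2 0 R >>
endobj
2 0 obj
<< /Kids [3 0 R] /Type /Pages /Count 1 >>
endobj
3 0 obj
<< /Parent 2 0 R /MediaBox [0 0 612 792]
/Resources << /Font << /F1 <<
/BaseFont /Arial /Subtype /Type1 /Type /Font >>
>> >> /Contents 4 0 R /Type /Page >>
endobj
4 0 obj
<< /Length 44 >>
stream
BT
/F1 110 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
trailer << /Root 1 0 R /Size 5 >>
"""

page_stream = first_page_stream(sample)
```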



Parsing 7/7 



7. the page is rendered

a. BT BeginText

b. <name> <size> Tf select font

c. <x> <y> Td move cursor

d. <string> Tj display string

e. ET EndText

BT
/F1 110 Tf
10 400 Td
(Hello World!) Tj
ET



%PDF-1.1 

%aaT6 

1 0 obj
<< /Pages 2 0 R >>
endobj

2 0 obj
<< /Kids [3 0 R] /Type /Pages /Count 1 >>
endobj

3 0 obj
<< /Parent 2 0 R /MediaBox [0 0 612 792]
/Resources << /Font << /F1 <<
/BaseFont /Arial /Subtype /Type1 /Type /Font >>
>> >> /Contents 4 0 R /Type /Page >>
endobj

4 0 obj
<< /Length 53 >>
stream
BT
/F1 110 Tf
10 400 Td
(Hello World !) Tj
ET
endstream
endobj

xref
0 5
0000000000 65535 f
0000000016 00000 n
0000000051 00000 n
0000000109 00000 n
0000000281 00000 n

trailer << /Root 1 0 R /Size 5 >>

startxref
384

%%EOF 
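
To make the five rendering operators concrete, here is a toy interpreter for them. It is a simplified sketch: it only understands BT, Tf, Td, Tj and ET, treats Td as a plain translation, and ignores nested parentheses in strings. `shown_text` is a hypothetical name.

```python
import re

# Order matters: literal strings, then names, then numbers, then operators.
TOKEN_RE = re.compile(
    rb"\((?:\\.|[^\\()])*\)|/[^\s/()\[\]<>]+|[-+]?\d*\.?\d+|[A-Za-z*]+"
)


def number(token: bytes) -> float:
    return float(token.decode("ascii"))


def shown_text(stream: bytes) -> list:
    """Return (font, size, x, y, text) for every Tj in a simple content stream."""
    operands = []
    font, size, x, y = None, None, 0.0, 0.0
    results = []
    for token in TOKEN_RE.findall(stream):
        if token[:1] in b"(/+-.0123456789":
            operands.append(token)  # operands pile up until an operator arrives
            continue
        if token == b"BT":          # begin text: reset the cursor
            x = y = 0.0
        elif token == b"Tf":        # <name> <size> Tf
            font, size = operands[-2].decode("ascii"), number(operands[-1])
        elif token == b"Td":        # <x> <y> Td moves relative to the line start
            x += number(operands[-2])
            y += number(operands[-1])
        elif token == b"Tj":        # <string> Tj
            text = operands[-1][1:-1].decode("latin-1")
            results.append((font, size, x, y, text))
        operands = []               # every operator consumes its operands
    return results


stream = b"BT\n/F1 110 Tf\n10 400 Td\n(Hello World!) Tj\nET\n"
drawn = shown_text(stream)
```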



In practice 



• that was the 'strict' minimum 

• a typical PDF embeds more information 

• fonts 

• font encodings

• metadata 



a generated Hello World typically weighs more than 5 KB



In practice - in the malware world 



• most readers accept malformed files 

o many elements missing 

■ EOF, startxref, xref, /Length, endobj, endstream 

■ /MediaBox /Font 

• each reader has its own weirdness 

o see my "Schizophrens" talks and PoCs 

• so much for the so-called "standard" 



%PDF-\01 0 obj<</Kids 
[<</Parent 1 0 R/Contents 
[2 0 R]>>]

/Resources<<>>>>2 0 
obj<<>>stream\n 
BT/F1 105 Tf 0 400 Td 
(Hello Adobe !)Tj ET 
endstream\n 
endobj\n 

trailer<</Root<</Pages 1 
0 R>>>> 



[Screenshot: helloworld-adobe.pdf opened in Adobe Reader, displaying "Hello Adobe!"]

a "Hello World" for Adobe, in 179 bytes



Conclusion 



we've covered the basics of: 

• file structure 

• object relationships

• file parsing 

• page rendering 



enough to play with PDF internals! 



A technical perspective 

Part III/III



Isn't copy/paste enough? 



• why not edit the file itself,

and restore the secrets perfectly?

want to hide something?

• create your own methods!



Easy PDF editing 



1. decompress streams

o PDFtk, qpdf

o optional: use ASCIIHex to get an ASCII-only file

2. open in a text editor

3. view results via Sumatra

overwrite, or comment out (don't delete)
=> no offsets to adjust



D:\>pdftk "PDF Secrets.pdf" output uncompressed.pdf uncompress 



D:\>qpdf --qdf "PDF Secrets.pdf" uncompressed.pdf 
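
What "decompress streams" means can be illustrated in a few lines of Python. This is only a sketch to look inside a file: it assumes plain /FlateDecode streams (no predictors, no chained filters) and does not rewrite the file the way PDFtk or qpdf do. `inflate_streams` is a hypothetical name.

```python
import re
import zlib

STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf: bytes) -> list:
    """Return the body of every stream, inflated when zlib accepts it."""
    bodies = []
    for match in STREAM_RE.finditer(pdf):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line left after the data.
            bodies.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            bodies.append(raw)  # not Flate-compressed, keep as is
    return bodies


content = b"BT /F1 110 Tf 10 400 Td (Hello World!) Tj ET"
packed = zlib.compress(content)
sample = (
    b"4 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(packed)
    + packed
    + b"\nendstream\nendobj\n"
)
streams = inflate_streams(sample)
```

For actual editing, the tools above remain the better choice, since they also fix /Length and the xref table.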



Reminder 



technically speaking, a PDF page is: 

1. a stream object

2. as the /Contents of a /Type /Page object 

3. in the /Kids array of a /Type /Pages object 

4. as the value of /Pages in root object 

5. as the value of /Root in the trailer 

and text on the page is simply (string) Tj



Remove a page ? 



easy hiding 

1. remove its reference from /Kids

2. write it back later 



obj
15776
endobj
1 0 obj
<<
/Type /Pages
/Kids [ 6 0 R 14 0 R 21 0 R ]
/Count 3
>>
endobj
and private image:
public text


locate the /Kids array 



Edit out your page's reference

obj
15776
endobj
1 0 obj
<<
/Type /Pages
/Kids [ 6 0 R 21 0 R ]
/Count 2
>>
endobj
public text



and don't forget to update the /Count of the /Pages object
(may lead to funny results) 
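
The manual edit can be scripted. The sketch below follows the "overwrite, don't delete" rule: the reference is blanked with spaces and /Count is rewritten in place, so the file length and all xref offsets stay valid. It assumes an uncompressed file with a single, flat /Pages object; `hide_page` is a hypothetical name.

```python
import re

COUNT_RE = re.compile(rb"/Count\s+(\d+)")


def hide_page(pdf: bytes, page_ref: bytes) -> bytes:
    """Blank one reference in the first /Kids array and decrement /Count."""
    kids = re.search(rb"/Kids\s*\[(.*?)\]", pdf, re.DOTALL)
    if kids is None:
        raise ValueError("no /Kids array found")
    # The lookbehind stops b"4 0 R" from matching inside b"14 0 R".
    found = re.search(rb"(?<!\d)" + re.escape(page_ref) + rb"(?!\w)", kids.group(1))
    if found is None:
        raise ValueError("reference not present in /Kids")
    out = bytearray(pdf)
    start = kids.start(1) + found.start()
    out[start:start + len(page_ref)] = b" " * len(page_ref)

    # Look for /Count only inside the object that holds this /Kids array.
    obj_start = max(pdf.rfind(b" obj", 0, kids.start()), 0)
    obj_end = pdf.find(b"endobj", kids.end())
    if obj_end == -1:
        obj_end = len(pdf)
    count = COUNT_RE.search(pdf, obj_start, obj_end)
    if count is not None and int(count.group(1)) > 0:
        digits = count.group(1)
        # Pad with spaces so the replacement has the same width.
        out[count.start(1):count.end(1)] = str(int(digits) - 1).encode().ljust(len(digits))
    return bytes(out)


sample = b"1 0 obj\n<<\n/Type /Pages\n/Kids [ 6 0 R 14 0 R 21 0 R ]\n/Count 3\n>>\nendobj\n"
hidden = hide_page(sample, b"14 0 R")
```

Writing the reference back later (step 2) is the same overwrite in reverse, since the blanked slot is exactly as wide as the original reference.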



Erasing a page with a tool 



tools such as PDFtk can operate on pages

D:\>pdftk "PDF Secrets.pdf" cat 1-3 5-end output no4.pdf

but:

they don't erase pages!

o they extract the other pages

the page as a whole is gone from the output

but its image contents (as objects) are still left!
and extractable!!



Erase overlapping element? 

• remove paint/text operators from binary stream 
Hint: 

overlapping elements might be 

at the end of the stream, 

as they were likely added last 



Operands   Operator   Description

-          S          Stroke the path.

-          s          Close and stroke the path. This operator shall have the same effect as the
                      sequence h S.

-          f          Fill the path, using the nonzero winding number rule to determine the region
                      to fill (see 8.5.3.3.2, "Nonzero Winding Number Rule"). Any subpaths that
                      are open shall be implicitly closed before being filled.

-          F          Equivalent to f; included only for compatibility. Although PDF reader
                      applications shall be able to accept this operator, PDF writer applications
                      should use f instead.

-          f*         Fill the path, using the even-odd rule to determine the region to fill (see
                      8.5.3.3.3, "Even-Odd Rule").

-          B          Fill and then stroke the path, using the nonzero winding number rule to
                      determine the region to fill. This operator shall produce the same result as
                      constructing two identical path objects, painting the first with f and the
                      second with S.
                      NOTE The filling and stroking portions of the operation consult
                      different values of several graphics state parameters, such as
                      the current colour. See also 11.7.4.4, "Special Path-Painting
                      Considerations".

-          B*         Fill and then stroke the path, using the even-odd rule to determine the region
                      to fill. This operator shall produce the same result as B, except that the path
                      is filled as if with f* instead of f. See also 11.7.4.4, "Special Path-Painting
                      Considerations".

-          b          Close, fill, and then stroke the path, using the nonzero winding number rule
                      to determine the region to fill. This operator shall have the same effect as the
                      sequence h B. See also 11.7.4.4, "Special Path-Painting Considerations".

-          b*         Close, fill, and then stroke the path, using the even-odd rule to determine the
                      region to fill. This operator shall have the same effect as the sequence h B*.
                      See also 11.7.4.4, "Special Path-Painting Considerations".

-          n          End the path object without filling or stroking it. This operator shall be a
                      path-painting no-op, used primarily for the side effect of changing the current
                      clipping path (see 8.5.4, "Clipping Path Operators").

paint operators

(PDF 32000-1:2008, page 135)



Operands          Operator   Description

string            Tj         Show a text string.

string            '          Move to the next line and show a text string. This operator shall have the
                             same effect as the code
                                 T*
                                 string Tj

a_w a_c string    "          Move to the next line and show a text string, using a_w as the word spacing
                             and a_c as the character spacing (setting the corresponding parameters in
                             the text state). a_w and a_c shall be numbers expressed in unscaled text
                             space units. This operator shall have the same effect as this code:
                                 a_w Tw
                                 a_c Tc
                                 string '

array             TJ         Show one or more text strings, allowing individual glyph positioning. Each
                             element of array shall be either a string or a number. If the element is a
                             string, this operator shall show the string. If it is a number, the operator
                             shall adjust the text position by that amount; that is, it shall translate the
                             text matrix, T_m. The number shall be expressed in thousandths of a unit
                             of text space (see 9.4.4, "Text Space Details"). This amount shall be
                             subtracted from the current horizontal or vertical coordinate, depending
                             on the writing mode. In the default coordinate system, a positive
                             adjustment has the effect of moving the next glyph painted either to the
                             left or down by the given amount. Figure 46 shows an example of the
                             effect of passing offsets to TJ.

text showing operators

(PDF 32000-1:2008, page 250-251)



Example:
manually remove
overlapping elements

[Figure: Example 2, content covered by a gray box with a black border, labelled "CONFIDENTIAL"]

take the uncompressed PDF

locate the /Contents stream object

locate the S (Stroke path)
(you can search for \nS\n)

erase the S
=> no more black border

locate the f (path Filling)

erase the f
=> no more gray surface

and the "obvious" Tj after the string (...)

Note: the letters are different, due to the font mapping
&->C, 2->O, 1->N...

Example 2

hidden via
overlapping shape

no more hidden elements!



bonus: the operation can be easily automated! 
(on all pages, etc..) 
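
A minimal sketch of such automation follows. It differs slightly from erasing by hand: each paint operator is overwritten with n (end the path without painting), padded to the same width, so the path is still terminated and the stream length does not change. It assumes an uncompressed stream in which operators sit on their own line, and it neutralizes every filled or stroked shape, not only the overlay. `neutralize_paint_operators` is a hypothetical name.

```python
import re

# Path-painting operators from the table above, alone on their own line.
PAINT_RE = re.compile(rb"(?<=\n)(f\*|B\*|b\*|[SsfFBb])(?=\r?\n)")


def neutralize_paint_operators(stream: bytes) -> bytes:
    """Replace stroke/fill operators with the no-op 'n', keeping the length."""
    return PAINT_RE.sub(lambda m: b"n".ljust(len(m.group(0))), stream)


stream = (
    b"0.5 g\n10 10 200 50 re\nf\n"   # gray surface
    b"0 G\n10 10 200 50 re\nS\n"     # black border
    b"BT\n/F1 12 Tf\n(secret) Tj\nET\n"
)
cleaned = neutralize_paint_operators(stream)
```

Running the same function over each page's /Contents stream would cover a whole document.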



Page size tricks 

• a page isn't just a /MediaBox :( 

o PDF is not so simple! 

■ CropBox/BleedBox/TrimBox/ArtBox/... 

• What you see is /CropBox 

o Copy/Paste and (some) pdftotext versions respect that

=> what is in the /MediaBox (but not in the /CropBox)
is not extracted



<< /Kids [3 0 R] /Type /
endobj

3 0 obj
<< /Parent 2 0 R
/MediaBox [0 0 612 950]
/CropBox [0 0 612 792]
/Resources << /Font << /
/BaseFont /Arial /Subtyp
>> >> /Contents 4 0 R /T
endobj

4 0 obj
<< /Length 75 >>
stream
BT
/F1 110 Tf
10 400 Td
(Hello World!) Tj
70 450 Td
(SECRET!) Tj
ET
endstream
endobj



[Screenshot: cropbox.pdf in SumatraPDF, showing only "Hello World!"]



<< /Kids [3 0 R] /Type /
endobj

3 0 obj
<< /Parent 2 0 R
/MediaBox [0 0 612 950]
/cropBox [0 0 612 792]
/Resources << /Font << /
/BaseFont /Arial /Subtyp
>> >> /Contents 4 0 R /T
endobj

4 0 obj
<< /Length 75 >>
stream
BT
/F1 110 Tf
10 400 Td
(Hello World!) Tj
70 450 Td
(SECRET!) Tj
ET
endstream
endobj

[Screenshot: the same file with the key renamed, now showing "SECRET!" above "Hello World!"]



disable /CropBox to see the full contents 




OS-X actually does a /CropBox when you copy/paste out of a PDF, 
and you can see the full original content by rotating the page. 
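
Both halves of the trick can be sketched in Python: reading the boxes of a page dictionary, and disabling /CropBox by renaming the key to something readers do not recognise (same length, so no offset moves). This assumes an uncompressed file; `page_boxes` and `disable_cropbox` are hypothetical names.

```python
import re

BOX_RE = re.compile(rb"/(MediaBox|CropBox)\s*\[\s*([-\d.\s]+?)\s*\]")


def page_boxes(page_dict: bytes) -> dict:
    """Return the /MediaBox and /CropBox rectangles found in a page dictionary."""
    boxes = {}
    for name, numbers in BOX_RE.findall(page_dict):
        boxes[name.decode("ascii")] = [float(n) for n in numbers.decode("ascii").split()]
    return boxes


def disable_cropbox(pdf: bytes) -> bytes:
    """Rename /CropBox to the unknown key /cropBox; the file length is unchanged."""
    return pdf.replace(b"/CropBox", b"/cropBox")


page = b"<< /Parent 2 0 R\n/MediaBox [0 0 612 950]\n/CropBox [0 0 612 792]\n>>"
before = page_boxes(page)
patched = disable_cropbox(page)
after = page_boxes(patched)
```

A page whose /MediaBox is taller or wider than its /CropBox, as here, is worth a second look.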



Hidden text 

• White color 

o 1 1 1 rg (fill color)

• text rendering mode 

o 3 Tr = invisible 

■ OCRs use it to store text 
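
Both tricks leave recognisable operators in the content stream, so a rough detector is easy to sketch. It is a heuristic only: white text is hidden only on a white background, and 3 Tr is also what OCR layers legitimately use. `hidden_text_markers` is a hypothetical name, and the stream is assumed to be decompressed.

```python
import re

INVISIBLE_RE = re.compile(rb"(?<![\d.])3\s+Tr\b")
ONE = rb"1(?:\.0+)?"
WHITE_FILL_RE = re.compile(rb"(?<![\d.])" + ONE + rb"\s+" + ONE + rb"\s+" + ONE + rb"\s+rg\b")


def hidden_text_markers(stream: bytes) -> list:
    """List the hiding tricks whose operators appear in a content stream."""
    findings = []
    if INVISIBLE_RE.search(stream):
        findings.append("invisible text rendering mode (3 Tr)")
    if WHITE_FILL_RE.search(stream):
        findings.append("white fill color (1 1 1 rg)")
    return findings


hidden = b"BT\n/F1 110 Tf\n10 400 Td\n1 1 1 rg\n3 Tr\n(Hello World!) Tj\nET\n"
visible = b"BT\n/F1 110 Tf\n10 400 Td\n0 0 0 rg\n0 Tr\n(Hello World!) Tj\nET\n"
hidden_report = hidden_text_markers(hidden)
visible_report = hidden_text_markers(visible)
```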



endobj
4 0 obj
<< /Length 68 >>
stream
BT
/F1 110 Tf
10 400 Td
1 1 1 rg
3 Tr
(Hello World!) Tj
ET
endstream
endobj

xref
0 5
0000000000 65535 f
0000000016 00000 n

endobj
4 0 obj
<< /Length 68 >>
stream
BT
/F1 110 Tf
10 400 Td
0 0 0 rg
0 Tr
(Hello World!) Tj
ET
endstream
endobj

xref
0 5
0000000000 65535 f
0000000016 00000 n

[Rendered page after the change: "Hello World!" is visible]











A more 'deniable' hiding 

altering /Kids or the page's /Contents works,

but there is another elegant solution: 
incremental updates 



PDF incremental updates 

• not commonly used 

o required for signing 

• but still supported by readers 
the concept: 

add another set of objects, xref, trailer, ...
to update the objects' hierarchy 



%PDF-1.1 
%aal6 



Example 

a confidential object 

with a secret stream object 4 

to be hidden 

[Screenshot: the file opened in SumatraPDF, showing "Top Secret"]



1 0 obj
<< /Pages 2 0 R >>
endobj

2 0 obj
<< /Kids [3 0 R] /Type /Pages /Count 1 >>
endobj

3 0 obj
<< /Parent 2 0 R /MediaBox [0 0 612 792]
/Resources << /Font << /F1 <<
/BaseFont /Arial /Subtype /Type1 /Type /Font >>
>> >> /Contents 4 0 R /Type /Page >>
endobj

4 0 obj
<< /Length 50 >>
stream
BT
/F1 120 Tf
10 400 Td
(Top Secret) Tj
ET
endstream
endobj

xref
0 5
0000000000 65535 f
0000000016 00000 n
0000000052 00000 n
0000000110 00000 n
0000000282 00000 n

trailer << /Size 5 /Root 1 0 R >>

startxref
385

%%EOF 



New /Contents 

append a new object 4 



4 0 obj
<< /Length 52 >>
stream
BT
/F1 110 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj



Extra xref 

append a new xref 
that references it 



xref
0 1
0000000000 65535 f
4 1
0000000551 00000 n



Extra trailer 1/2 



same /Size & /Root

references the previous xref via /Prev

(not the previous trailer)



trailer << 
/Size 5 
/Root 1 0 R 
/Prev 385 

>> 



Extra trailer 2/2 

points to the new xref 
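
The four appended pieces (object, xref, trailer, startxref) can be generated rather than typed. The sketch below only reads the last startxref value and the last trailer of the existing file, then appends an update that overrides one object. It assumes generation 0, a flat trailer dictionary and classic xref tables; `append_incremental_update` is a hypothetical name, and `stub` stands in for a real file (its offset 385 is a placeholder, not a computed value).

```python
import re


def append_incremental_update(pdf: bytes, obj_num: int, new_body: bytes) -> bytes:
    """Append a replacement for object `obj_num` as an incremental update."""
    prev_xref = int(re.findall(rb"startxref\s+(\d+)", pdf)[-1])
    trailer = re.findall(rb"trailer\s*<<(.*?)>>", pdf, re.DOTALL)[-1]
    size = int(re.search(rb"/Size\s+(\d+)", trailer).group(1))
    root = re.search(rb"/Root\s+(\d+\s+\d+\s+R)", trailer).group(1)

    out = pdf if pdf.endswith(b"\n") else pdf + b"\n"
    obj_offset = len(out)                       # where the new object starts
    out += b"%d 0 obj\n" % obj_num + new_body + b"\nendobj\n"
    xref_offset = len(out)                      # where the new xref starts
    out += b"xref\n0 1\n0000000000 65535 f \n"
    out += b"%d 1\n%010d 00000 n \n" % (obj_num, obj_offset)
    # Same /Size and /Root; /Prev points at the previous xref table.
    out += b"trailer << /Size %d /Root %s /Prev %d >>\n" % (size, root, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return out


# Stub with only the fields the function reads; not a complete PDF.
stub = (
    b"%PDF-1.1\n4 0 obj\n<< >>\nendobj\n"
    b"trailer << /Size 5 /Root 1 0 R >>\nstartxref\n385\n%%EOF\n"
)
new_contents = b"<< /Length 44 >>\nstream\nBT\n/F1 110 Tf\n10 400 Td\n(Hello World!) Tj\nET\nendstream"
updated = append_incremental_update(stub, 4, new_contents)
```

The original bytes are left untouched at the start of `updated`, which is exactly why the earlier content stays recoverable.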



Result 

=> different content ! 

restore content by cutting after the first %%EOF 

[Screenshot: incremental.pdf in SumatraPDF, now showing "Hello World!"]



Incremental update to 



hide page 



use the same trick 

to override /Type /Pages 



%%EOF 

1 0 obj 

<< 

/Type /Pages 

/Kids [ 6 0 R 21 0 R] 

/Count 2 

>> 

endobj 

xref 
0 1 

0000000000 65535 f 

1 1 

0000118783 00000 n 

trailer << /Size 41 /Root 4 
0 R /Prev 117882 >> 

startxref 

118849 

%%EOF 



Actual leaks in the wild ? 

in any PDF with /Prev in the trailer: 
restore each intermediate version 
by truncating after each %%EOF 
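
Truncating after each %%EOF is easy to script. The sketch below returns one candidate file per marker, oldest first. It is an approximation: not every %%EOF marks a real earlier revision (linearized files, for instance, also carry more than one), so each candidate still has to be opened and checked. `pdf_revisions` is a hypothetical name.

```python
def pdf_revisions(pdf: bytes) -> list:
    """Return the file truncated after each %%EOF marker, oldest first."""
    marker = b"%%EOF"
    revisions = []
    position = 0
    while True:
        index = pdf.find(marker, position)
        if index == -1:
            break
        end = index + len(marker)
        revisions.append(pdf[:end] + b"\n")
        position = end
    return revisions


# Two "revisions" reduced to placeholders, to show the slicing only.
sample = b"first revision\n%%EOF\nsecond revision\n%%EOF\n"
versions = pdf_revisions(sample)
```

A quick pre-filter is to look for /Prev in the trailers before bothering to split a file.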
[Figure: two revisions of the first page of the DS5002FP Secure Microprocessor Chip datasheet (REV: 090805), recovered from the same file. General Description, Pin Configuration and Features are the same in both; the Ordering Information table differs between the two revisions.]



incremental PDF found in the wild 
(removed parts, incorrect page number) 



[Figure: the REVISION HISTORY pages of the two DS5002FP datasheet revisions, side by side. Both list the same entries from 112795 (original release) to 090805; one revision carries an additional final entry: "Removed products from Ordering Information table that do not contain internal micro probe shields."]
Copy/Paste corruption

some files produce corrupted text when
copying

(mentioned in the first part)
this is due to fonts:

o /Subtype /Type3

o with no /ToUnicode mapping



Conclusion 



• the PDF file format is awkward 

o not too complex if you just want to hide/reveal secrets 

• be careful when removing sensitive elements! 

o quite easy to check whether elements are really removed or not
o overlapping DOESN'T work 

• hiding and recovering elements is 'easy' 

o content is still there! 



Suggestions? 



I'm interested in: 

• hiding techniques

• automated revealing techniques

• documents that are a pain to 'rebuild' 

o split fonts in small paths ? 

o licensed fonts are converted to glyphs 
=> no more text 